Performance database: capturing data for optimizing distributed streaming workflows.
Authors
Abstract
The performance database (PDB) stores performance-related data gathered during workflow enactment. We argue that, by carefully understanding and manipulating these data, we can improve efficiency when enacting workflows. This paper describes the rationale behind the PDB and proposes a systematic way to implement it. The prototype is built as part of the Advanced Data Mining and Integration Research for Europe project. We use workflows from real-world experiments to demonstrate the use of the PDB.
Similar resources
Performance Database: Capturing Data for Optimising Distributed Streaming Workflows
It is evident that data-intensive research is transforming the computing landscape, as recognised in “The Fourth Paradigm” [1]. Due to the scale, complexity and heterogeneity of data gathered in scientific experiments, we cannot naively dump the data into computing resources and hope to extract useful information and knowledge through exhaustive and unstructured computations. To survive t...
A Compiler Toolchain for Distributed Data Intensive Scientific Workflows
By Peter Bui. With the growing amount of computational resources available to researchers today and the explosion of scientific data in modern research, it is imperative that scientists be able to construct data processing applications that harness these vast computing systems. To address this need, I propose applying concepts from traditional compilers, linkers, and profilers to the constructio...
Complexity Analysis and Performance Optimization of Distributed Computing Workflows: From Theory to Practice
The advance of supercomputing technology is expediting the transition in various basic and applied sciences from traditional laboratory-controlled experimental methodologies to modern computational paradigms involving complex numerical model analyses and extreme-scale simulations. These computation-based simulations and analyses have become an essential research and discovery tool in next-genera...
Optimizing Query Processing in Batch Streaming System
With the growing need to process “big data” in real time, modern stream processing systems should be able to operate at cloud scale. This poses challenges for building large-scale stream processing systems. First, processing tasks should be efficiently distributed to worker nodes with small overhead. Second, streaming data processing should be highly available, despite that failures ...
A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows
Integrated provenance support promises to be a chief advantage of scientific workflow systems over script-based alternatives. While it is often recognized that information gathered during scientific workflow execution can be used automatically to increase fault tolerance (via checkpointing) and to optimize performance (by reusing intermediate data products in future runs), it is perhaps more si...
Journal title:
- Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
Volume 369, Issue 1949
Pages -
Publication date: 2011